26 research outputs found

    An experiment in audio classification from compressed data

    In this paper we present an algorithm for automatic classification of sound into speech, instrumental sound/music, and silence. The method is based on thresholding of features derived from the modulation envelope of the frequency-limited audio signal. Four characteristics are examined for discrimination: the occurrence and duration of energy peaks, rhythmic content, and the level of harmonic content. The proposed algorithm allows classification directly on MPEG-1 audio bitstreams. The performance of the classifier was evaluated on TRECVID test data; the test results are above average among all TREC participants. The approaches adopted by other research groups participating in TREC are also discussed.
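
    The decision logic is simple enough to sketch. Below is a minimal illustration of threshold-based speech/music/silence labelling on a decoded signal; the envelope smoothing, the peak definition, and all threshold values are assumptions for illustration, not the paper's settings (which also operate on compressed-domain data).

        import numpy as np

        def modulation_envelope(x, sr, cutoff_hz=20.0):
            # Crude modulation envelope: rectify, then moving-average low-pass.
            win = max(1, int(sr / cutoff_hz))
            return np.convolve(np.abs(x), np.ones(win) / win, mode="same")

        def classify(x, sr, silence_thr=0.01, peak_rate_thr=3.0):
            env = modulation_envelope(x, sr)
            if env.mean() < silence_thr:
                return "silence"
            # Energy peaks: local maxima of the envelope above its mean level.
            mid = env[1:-1]
            peaks = np.flatnonzero((mid > env[:-2]) & (mid > env[2:]) & (mid > env.mean()))
            peak_rate = len(peaks) * sr / len(x)   # peaks per second
            # Speech shows frequent syllabic energy peaks; music is steadier.
            return "speech" if peak_rate > peak_rate_thr else "music"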

    Rhythm detection for speech-music discrimination in MPEG compressed domain

    A novel approach to speech-music discrimination based on rhythm (or beat) detection is introduced. Rhythmic pulses are detected by applying a long-term autocorrelation method to band-passed signals. This approach is combined with another in which the features describe the energy peaks of the signal. The discriminator uses just three features, computed from data taken directly from an MPEG-1 bitstream. The discriminator was tested on more than 3 hours of audio data; the average recognition rate is 97.7%.
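
    The rhythm feature rests on long-term autocorrelation of a band-passed energy envelope: a strong, well-placed autocorrelation peak signals a steady beat. A minimal sketch follows; the BPM range and the normalisation are assumptions, not the paper's parameters.

        import numpy as np

        def rhythm_strength(env, env_rate, min_bpm=60, max_bpm=200):
            # env: energy envelope sampled at env_rate values per second.
            env = env - env.mean()
            ac = np.correlate(env, env, mode="full")[len(env) - 1:]
            ac = ac / (ac[0] + 1e-12)              # normalise by zero-lag energy
            lo = int(env_rate * 60 / max_bpm)      # shortest beat period (samples)
            hi = int(env_rate * 60 / min_bpm)      # longest beat period (samples)
            lag = lo + np.argmax(ac[lo:hi])
            return 60.0 * env_rate / lag, ac[lag]  # (BPM estimate, peak strength)

    A high peak strength suggests music; speech rarely sustains a stable periodicity in this range.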

    MPEG-1 bitstreams processing for audio content analysis

    In this paper, we present the MPEG-1 audio bitstream processing work in which our research group is involved. This work is primarily based on processing the encoded bitstream and extracting useful audio features for the purposes of analysis and browsing. To prepare for the discussion of these features, the MPEG-1 audio bitstream format is first described. The application programming interface (API) we have been developing in C++ is then introduced, before the paper concludes with a discussion of audio feature extraction.
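
    The paper's API is implemented in C++; the hypothetical Python sketch below conveys the same idea of walking MPEG-1 audio frame headers by syncword search so that per-frame fields (e.g. scalefactors) can be read without decoding. Class and method names are invented for illustration.

        class Mpeg1AudioStream:
            SYNC_MASK = 0xFFE0            # first 11 bits of a frame header are 1s

            def __init__(self, path):
                with open(path, "rb") as f:
                    self.data = f.read()

            def frame_offsets(self):
                # Yield byte offsets of candidate frame headers (syncword search).
                i = 0
                while i + 4 <= len(self.data):
                    hdr = int.from_bytes(self.data[i:i + 2], "big")
                    if (hdr & self.SYNC_MASK) == self.SYNC_MASK:
                        yield i
                        i += 4            # a real parser derives the frame length
                    else:                 # from the bitrate/sample-rate fields
                        i += 1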

    Stacked Convolutional and Recurrent Neural Networks for Music Emotion Recognition

    This paper studies emotion recognition from musical tracks in the two-dimensional valence-arousal (V-A) emotional space. We propose a method based on convolutional (CNN) and recurrent neural networks (RNN), with significantly fewer parameters than the state-of-the-art method for the same task. We utilize one CNN layer followed by two branches of RNNs trained separately for arousal and valence. The method was evaluated using the 'MediaEval2015 emotion in music' dataset. We achieved an RMSE of 0.202 for arousal and 0.268 for valence, which is the best result reported on this dataset. (Accepted for Sound and Music Computing, SMC 2017.)
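
    The described architecture (one convolutional layer feeding two recurrent branches) can be sketched as below; the layer sizes, the input representation, and joint rather than separate training of the branches are all simplifying assumptions, not the paper's hyper-parameters.

        from tensorflow.keras import Input, Model, layers

        def rnn_branch(x, name):
            h = layers.GRU(64)(x)                    # recurrent branch for one target
            return layers.Dense(1, name=name)(h)     # continuous V or A estimate

        inp = Input(shape=(None, 40))                # (time steps, mel-band energies)
        x = layers.Conv1D(32, kernel_size=5, padding="same", activation="relu")(inp)
        model = Model(inp, [rnn_branch(x, "arousal"), rnn_branch(x, "valence")])
        model.compile(optimizer="adam", loss="mse")  # report RMSE as sqrt of MSE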

    Speech-music discrimination from MPEG-1 bitstream

    This paper describes a proposed algorithm for speech/music discrimination that works on data taken directly from the MPEG-encoded bitstream, thus avoiding the computationally expensive decoding-encoding process. The method is based on thresholding of features derived from the modulation envelope of the frequency-limited audio signal. The discriminator was tested on more than 2 hours of audio data containing clean and noisy speech from several speakers and a variety of music content. The discriminator is able to work in real time and, despite its simplicity, the results are very promising.
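
    In the compressed domain, per-frame scalefactors already provide a coarse subband envelope, which is what makes full decoding unnecessary. A toy sketch, where the subband selection and both thresholds are illustrative assumptions:

        import numpy as np

        def discriminate(scalefactors):
            # scalefactors: (num_frames, num_subbands) array read from the bitstream.
            env = scalefactors[:, :4].sum(axis=1)   # low-frequency envelope
            env = env / (env.max() + 1e-12)
            pause_ratio = np.mean(env < 0.1)        # fraction of near-silent frames
            mod_depth = np.var(np.diff(env))        # frame-to-frame modulation depth
            # Speech alternates syllables and pauses; music is more continuous.
            return "speech" if pause_ratio > 0.05 and mod_depth > 1e-3 else "music"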

    Voice Operated Information System in Slovak

    Speech communication interfaces (SCI) are nowadays widely used in several domains. Automated spoken language human-computer interaction can replace human-human interaction if needed. Automatic speech recognition (ASR), a key technology of SCI, has been extensively studied during the past few decades. Most present systems are based on statistical modeling, at both the acoustic and linguistic levels. Increased attention has recently been paid to speech recognition in adverse conditions, since noise resistance has become one of the major bottlenecks for the practical use of speech recognizers. Although many techniques have been developed, many challenges still have to be overcome before the ultimate goal of creating machines capable of communicating with humans naturally can be achieved. In this paper we describe the research and development of the first Slovak spoken language dialogue system (SLDS). The dialogue system is based on the DARPA Communicator architecture. The proposed system consists of the Galaxy hub and telephony, automatic speech recognition, text-to-speech, backend, transport, and VoiceXML dialogue management modules. The SCI enables multi-user interaction in the Slovak language. The functionality of the SLDS is demonstrated and tested via two pilot applications, "Weather forecast for Slovakia" and "Timetable of Slovak Railways". The required information is retrieved from Internet resources in multi-user mode through PSTN, ISDN, GSM and/or VoIP networks.
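
    In the Communicator pattern, a central Galaxy hub routes message frames between otherwise independent modules. The toy sketch below mirrors the module list from the abstract; the routing rule and the frame contents are invented for illustration.

        class Hub:
            def __init__(self):
                self.modules = {}                    # name -> callable(frame) -> frame

            def register(self, name, handler):
                self.modules[name] = handler

            def route(self, frame, pipeline):
                for name in pipeline:                # pass the frame module to module
                    frame = self.modules[name](frame)
                return frame

        hub = Hub()
        hub.register("asr", lambda f: {**f, "text": "weather forecast"})
        hub.register("dialogue", lambda f: {**f, "query": "weather"})
        hub.register("backend", lambda f: {**f, "answer": "sunny"})
        hub.register("tts", lambda f: {**f, "audio_out": b"..."})
        result = hub.route({"audio_in": b"..."}, ("asr", "dialogue", "backend", "tts"))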

    Dublin City University video track experiments for TREC 2002

    Dublin City University participated in the Feature Extraction task and the Search task of the TREC-2002 Video Track. In the Feature Extraction task, we submitted three features: Face, Speech, and Music. In the Search task, we developed an interactive video retrieval system which incorporated the 40 hours of the video search test collection and supported user searching using our own feature extraction data along with the donated feature data and ASR transcripts from other Video Track groups. This video retrieval system allows a user to specify a query based on the 10 features and the ASR transcript; the query result is a ranked list of videos that can be further browsed at the shot level. To evaluate the usefulness of feature-based querying, we developed a second system interface that provides only ASR transcript-based querying, and we conducted an experiment with 12 test users to compare the two systems. Results were submitted to NIST, and we are currently conducting further analysis of user performance with the two systems.
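
    The kind of combined query such an interface supports can be illustrated by scoring each shot on ASR text matches plus weighted detector confidences; the linear fusion and the weights below are assumptions, not DCU's actual ranking function.

        def score_shot(shot, query_terms, wanted_features, w_text=1.0, w_feat=0.5):
            text = shot["asr"].lower()
            text_score = sum(term in text for term in query_terms)
            feat_score = sum(shot["features"].get(f, 0.0) for f in wanted_features)
            return w_text * text_score + w_feat * feat_score

        shots = [
            {"asr": "the president spoke today", "features": {"face": 0.9}},
            {"asr": "orchestra performance", "features": {"music": 0.95}},
        ]
        ranked = sorted(shots, reverse=True,
                        key=lambda s: score_shot(s, ["president"], ["face"]))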

    Discriminative feature selection for applause sounds detection

    Specific sounds such as applause, laughter, explosions, etc. are very helpful for understanding the high-level semantics of audio/video content. The paper focuses on feature selection by evolutionary programming for automatic detection of applause in an audio stream. A set of the most discriminative features is selected by a Genetic Algorithm and by Simulated Annealing. The experiments are run on more than 9 hours of audio selected from various audio and video content. The results show that applause recognition improves if only a few coefficients are selected from the MFCC static and dynamic features. Further, the delta-delta coefficients (the second time derivatives of the MFCCs) considerably outperform the delta coefficients.
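
    A genetic algorithm for this task typically encodes each candidate feature subset as a bit mask over the MFCC vector and uses held-out detection accuracy as fitness. A minimal sketch, where evaluate is a placeholder fitness function you must supply:

        import random

        def ga_select(n_feats, evaluate, pop_size=20, gens=50, p_mut=0.05):
            pop = [[random.random() < 0.5 for _ in range(n_feats)]
                   for _ in range(pop_size)]
            for _ in range(gens):
                pop.sort(key=evaluate, reverse=True)     # fittest masks first
                parents = pop[:pop_size // 2]
                children = []
                while len(parents) + len(children) < pop_size:
                    a, b = random.sample(parents, 2)
                    cut = random.randrange(1, n_feats)
                    child = a[:cut] + b[cut:]            # one-point crossover
                    children.append([g ^ (random.random() < p_mut) for g in child])
                pop = parents + children
            return max(pop, key=evaluate)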

    Development of a System for Automatic Recognition of Speech

    The article gives a review of research on the processing and automatic recognition of speech signals (ASR) at the Department of Telecommunications of the Faculty of Electrical Engineering, University of Zilina. Ongoing research is oriented towards speech parametrization using two-dimensional cepstral analysis, and towards the application of HMMs and neural networks to speech recognition in the Slovak language. The article summarizes the results achieved and outlines the future direction of our research in automatic speech recognition.
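
    Two-dimensional cepstral analysis, roughly speaking, applies a second transform along time to a block of frame-level cepstra, so that spectral and temporal structure are captured jointly. A minimal sketch with assumed block and coefficient sizes:

        import numpy as np
        from scipy.fft import dct

        def two_d_cepstrum(cepstra, n_time=8, n_quef=12):
            # cepstra: (frames, coeffs) matrix of per-frame cepstral vectors.
            block = np.asarray(cepstra)[:n_time, :n_quef]
            return dct(block, axis=0, norm="ortho")   # DCT along the time axis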